Performance and Energy Aware Workload Partitioning on Heterogeneous Platforms
Author
Abstract
Heterogeneous platforms, which combine CPUs with accelerators such as GPUs, are widely used in high-performance computing [1]. Such platforms have the potential to deliver higher performance at lower energy cost than homogeneous platforms. In practice, however, achieving the promised performance and energy efficiency is challenging. The main difficulty is that the processors in a heterogeneous system usually have distinct characteristics, which is not the case in homogeneous platforms. This difficulty gives rise to two main challenges. The first is using the different types of processors efficiently. Many studies [2], [3], [4] have sought to increase processor utilization on heterogeneous platforms, but this line of work assumes that the workload has already been implemented or requires online profiling information. The second is the large design-time cost of partitioning a workload for a heterogeneous platform. Numerous efforts [5], [6], [7] have addressed automatic workload partitioning on heterogeneous platforms, but they consider only data partitioning (DP). Workload here refers to the amount of computation (measured in flops) and memory traffic (measured in bytes) to be executed at the algorithmic level. Judicious code partitioning (CP) between distinct processors has been shown to achieve better performance and energy efficiency than DP when running workloads on heterogeneous platforms [8]. An example is shown in Figures 1 and 2. The CP-based implementation maps the vector addition and power operations in all iterations onto the CPU and GPU, respectively. The DP-based implementation maps the iterations (i.e., dataset size) evenly onto the CPU and GPU based on the CPU and GPU performance. The results of running these two implementations on two heterogeneous platforms indicate that CP can outperform DP, but that the outcome also depends on the particular combination of CPU and GPU. To help developers consider both DP and CP at design time at an affordable cost, my thesis research aims to develop a lightweight tool that helps developers partition a workload and select an appropriate workload partition (WP).
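The contrast between DP and CP for the vector addition and power example can be illustrated with a toy cost model. The sketch below is not the tool described in the abstract; it is a minimal illustration under stated assumptions. The `Processor` class, the throughput numbers, and the functions `dp_time`, `cp_time`, and `best_partition` are all hypothetical. The model assumes that a DP split is balanced so both processors finish together, that the two CP stages overlap fully, and that data transfers cost nothing. Energy is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    name: str
    add_rate: float  # vector-addition elements per second (hypothetical)
    pow_rate: float  # power-operation elements per second (hypothetical)


def dp_time(n: int, cpu: Processor, gpu: Processor) -> float:
    """Data partitioning: each processor runs both operations on its own share."""
    cpu_cost = 1 / cpu.add_rate + 1 / cpu.pow_rate  # seconds per element
    gpu_cost = 1 / gpu.add_rate + 1 / gpu.pow_rate
    # Assumption: the split is chosen so that both processors finish together.
    cpu_share = gpu_cost / (cpu_cost + gpu_cost)
    return n * cpu_share * cpu_cost


def cp_time(n: int, cpu: Processor, gpu: Processor) -> float:
    """Code partitioning: the CPU runs every addition, the GPU every power op."""
    # Assumption: the two stages overlap fully and data transfers are free.
    return max(n / cpu.add_rate, n / gpu.pow_rate)


def best_partition(n: int, cpu: Processor, gpu: Processor):
    """Return the name of the faster partition and the estimated times."""
    times = {"DP": dp_time(n, cpu, gpu), "CP": cp_time(n, cpu, gpu)}
    return min(times, key=times.get), times


if __name__ == "__main__":
    # Two made-up platforms: on A each processor is good at a different
    # operation; on B the GPU is uniformly faster than the CPU.
    platform_a = (Processor("cpu_a", 4.0, 1.0), Processor("gpu_a", 2.0, 8.0))
    platform_b = (Processor("cpu_b", 1.0, 1.0), Processor("gpu_b", 8.0, 8.0))
    for label, (cpu, gpu) in (("A", platform_a), ("B", platform_b)):
        choice, times = best_partition(12, cpu, gpu)
        print(label, choice, times)
```

In this model the preferred partition flips between the two made-up platforms, which mirrors the observation that the CPU and GPU combination affects whether CP beats DP. A real comparison would also have to account for transfer costs and energy.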
Similar resources
Energy-Aware Task Partitioning on Heterogeneous Multiprocessor Platforms
Efficient task partitioning plays a crucial role in achieving high performance on multiprocessor platforms. This paper addresses the problem of energy-aware static partitioning of periodic real-time tasks on heterogeneous multiprocessor platforms. A Particle Swarm Optimization variant based on the Min-min technique for task partitioning is proposed. The proposed approach aims to minimize the overall...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. Current Hadoop and the rack-aware data placement strategy of the existing Hadoop distributed file system assume a homogeneous cluster in which every node has the same computing capacity and is assigned the same workload. Default Hadoop d...
Optimization of Data-Parallel Scientific Applications on Highly Heterogeneous Modern HPC Platforms
Over the past decade, the design of microprocessors has been shifting to a new model where the microprocessor has multiple homogeneous processing units, aka cores, as a result of heat dissipation and energy consumption issues. Meanwhile, the demand for heterogeneity increases in computing systems due to the need for high performance computing in recent years. The current trend in gaining high c...
Device-Aware Cache Management based on Adaptive Replacement
Heterogeneous devices have been widely adopted in mobile storage systems because combining such devices can provide a synergistic storage solution that takes advantage of each device. For heterogeneous storage systems, several studies have sought to enhance I/O performance by devising suitable buffer cache management algorithms. This paper presents a novel device-aware buf...
Accommodating Diversity in CMPs with Heterogeneous Frequencies
Shrinking process technologies and growing chip sizes have profound effects on process variation. This leads to Chip Multiprocessors where not all cores operate at maximum frequency. Instead of simply disabling these slower cores or using guard banding (running all at the frequency of the slowest logic block), we investigate keeping them active, and examine the performance and power efficiency ...
Journal title:
Volume, issue
Pages -
Publication date 2016